Marks: 60
The number of restaurants in New York is increasing day by day. Lots of students and busy professionals rely on those restaurants due to their hectic lifestyles. Online food delivery service is a great option for them. It provides them with good food from their favorite restaurants. A food aggregator company FoodHub offers access to multiple restaurants through a single smartphone app.
The app allows the restaurants to receive a direct online order from a customer. The app assigns a delivery person from the company to pick up the order after it is confirmed by the restaurant. The delivery person then uses the map to reach the restaurant and waits for the food package. Once the food package is handed over to the delivery person, he/she confirms the pick-up in the app and travels to the customer's location to deliver the food. The delivery person confirms the drop-off in the app after delivering the food package to the customer. The customer can rate the order in the app. The food aggregator earns money by collecting a fixed margin of the delivery order from the restaurants.
The food aggregator company has stored the data of the different orders made by the registered customers in their online portal. They want to analyze the data to get a fair idea about the demand of different restaurants which will help them in enhancing their customer experience. Suppose you are hired as a Data Scientist in this company and the Data Science team has shared some of the key questions that need to be answered. Perform the data analysis to find answers to these questions that will help the company to improve the business.
The data contains the different data related to a food order. The detailed data dictionary is given below.
# import libraries for data manipulation
import numpy as np
import pandas as pd
# import libraries for data visualization
import matplotlib.pyplot as plt
import seaborn as sns
# importing plotly
import plotly.express as px
# to restrict the float value to 3 decimal places
pd.set_option('display.float_format', lambda x: '%.3f' % x)
import warnings
warnings.filterwarnings('ignore')
# read the data
df = pd.read_csv('foodhub_order.csv')
# returns the first 5 rows
df.head()
| order_id | customer_id | restaurant_name | cuisine_type | cost_of_the_order | day_of_the_week | rating | food_preparation_time | delivery_time | |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 1477147 | 337525 | Hangawi | Korean | 30.750 | Weekend | Not given | 25 | 20 |
| 1 | 1477685 | 358141 | Blue Ribbon Sushi Izakaya | Japanese | 12.080 | Weekend | Not given | 25 | 23 |
| 2 | 1477070 | 66393 | Cafe Habana | Mexican | 12.230 | Weekday | 5 | 23 | 28 |
| 3 | 1477334 | 106968 | Blue Ribbon Fried Chicken | American | 29.200 | Weekend | 3 | 25 | 15 |
| 4 | 1478249 | 76942 | Dirty Bird to Go | American | 11.590 | Weekday | 4 | 25 | 24 |
Per above, the DataFrame has 9 columns. Columns for each row reprsents the attributes of the order placed by a customer to the food aggregator company
# Write your code here
print("There are", df.shape[0], 'rows and', df.shape[1], "columns.")
There are 1898 rows and 9 columns.
# Use info() to print a concise summary of the DataFrame
df.info()
<class 'pandas.core.frame.DataFrame'> RangeIndex: 1898 entries, 0 to 1897 Data columns (total 9 columns): # Column Non-Null Count Dtype --- ------ -------------- ----- 0 order_id 1898 non-null int64 1 customer_id 1898 non-null int64 2 restaurant_name 1898 non-null object 3 cuisine_type 1898 non-null object 4 cost_of_the_order 1898 non-null float64 5 day_of_the_week 1898 non-null object 6 rating 1898 non-null object 7 food_preparation_time 1898 non-null int64 8 delivery_time 1898 non-null int64 dtypes: float64(1), int64(4), object(4) memory usage: 133.6+ KB
# Write your code here
# checking missing values
df.isnull().sum()
order_id 0 customer_id 0 restaurant_name 0 cuisine_type 0 cost_of_the_order 0 day_of_the_week 0 rating 0 food_preparation_time 0 delivery_time 0 dtype: int64
#1 Filter the rated restaurants
df_rated = df[df['rating'] != 'Not given'].copy()
#2 convert rating column from object to integer
df_rated['rating'] = df_rated['rating'].astype('int')
# Write your code here
df.describe(include= 'all').T
| count | unique | top | freq | mean | std | min | 25% | 50% | 75% | max | |
|---|---|---|---|---|---|---|---|---|---|---|---|
| order_id | 1898.000 | NaN | NaN | NaN | 1477495.500 | 548.050 | 1476547.000 | 1477021.250 | 1477495.500 | 1477969.750 | 1478444.000 |
| customer_id | 1898.000 | NaN | NaN | NaN | 171168.478 | 113698.140 | 1311.000 | 77787.750 | 128600.000 | 270525.000 | 405334.000 |
| restaurant_name | 1898 | 178 | Shake Shack | 219 | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| cuisine_type | 1898 | 14 | American | 584 | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| cost_of_the_order | 1898.000 | NaN | NaN | NaN | 16.499 | 7.484 | 4.470 | 12.080 | 14.140 | 22.297 | 35.410 |
| day_of_the_week | 1898 | 2 | Weekend | 1351 | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| rating | 1898 | 4 | Not given | 736 | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| food_preparation_time | 1898.000 | NaN | NaN | NaN | 27.372 | 4.632 | 20.000 | 23.000 | 27.000 | 31.000 | 35.000 |
| delivery_time | 1898.000 | NaN | NaN | NaN | 24.162 | 4.973 | 15.000 | 20.000 | 25.000 | 28.000 | 33.000 |
# Write the code her
unrated_count = len(df[df['rating']=='Not given'])
unrated_percent = round(100*unrated_count/df.shape[0],2)
print(f'Out of a total of {df.shape[0]} orders, {unrated_count}, which is about {unrated_percent}%, are not rated' )
Out of a total of 1898 orders, 736, which is about 38.78%, are not rated
df['rating'].unique()
array(['Not given', '5', '3', '4'], dtype=object)
736 (~38.8%) Orders are not rated. The unique() function is called to double check to ensure that was no other way the absence of rating was reported
# Write the code here
df['restaurant_name'].nunique()
178
There were 178 unique resutaurants where orders were placed, lets see find out out how they stake in terms of number of times they show up in the dataset
plt.figure(figsize=(75,55))
fig = px.histogram(df, x='restaurant_name', color="cuisine_type", labels={
"restaurant_name": "Restaurant Name",
"count":"count"
},
title="Orders per Restaurant").update_xaxes(categoryorder='total descending')
fig.show()
<Figure size 7500x5500 with 0 Axes>
print(f"There are a total of {df['cuisine_type'].nunique()} unique cuisine types:")
There are a total of 14 unique cuisine types:
List of Unique Cuisine Types are as follows:
print(df['cuisine_type'].unique())
['Korean' 'Japanese' 'Mexican' 'American' 'Indian' 'Italian' 'Mediterranean' 'Chinese' 'Middle Eastern' 'Thai' 'Southern' 'French' 'Spanish' 'Vietnamese']
df['cuisine_type'].value_counts()
American 584 Japanese 470 Italian 298 Chinese 215 Mexican 77 Indian 73 Middle Eastern 49 Mediterranean 46 Thai 19 French 18 Southern 17 Korean 13 Spanish 12 Vietnamese 7 Name: cuisine_type, dtype: int64
plt.figure(figsize=(12,8))
sns.countplot(data=df, x='cuisine_type', order = df['cuisine_type'].value_counts().index)
plt.xticks(rotation=60)
plt.xlabel('Cuisine Type')
plt.ylabel('Order Count');
plt.figure(figsize=(12,8))
sns.histplot(data=df, x='cost_of_the_order', kde=True, bins=35)
plt.show()
The tallest bin ranges between \$12 & \$13, indicating the range of order cost with higherst number of orders
sns.boxplot(data=df,x='cost_of_the_order')
plt.show()
From the box plot, it can be observed that the order cost distribution is right skewed & there are No outliers. The median cost is around \$14 and the 1st quartlie is around \$12 and 3rd quartile is around \$22.5. The inner quartile range is 22.5-12 ~ \$10. Min is around \$4 dollars and max is around \$35
plt.figure(figsize=(12,8))
sns.countplot(data=df, x='day_of_the_week')
<Axes: xlabel='day_of_the_week', ylabel='count'>
The orders on Weekends are routhly 2.5 times of the Weekday orders. Since folks during weekdays tend to go for something quick, reducing the overall delivery time can positively contribute to Weekay order counts
print(f"There are a total of {df['rating'].nunique()} unique Rating Entries in the data set:")
There are a total of 4 unique Rating Entries in the data set:
ratingentries = df['rating'].unique()
print("List of Unique Rating Entries are", ratingentries)
List of Unique Rating Entries are ['Not given' '5' '3' '4']
plt.figure(figsize=(12,8))
sns.countplot(data=df, x='rating',order = df['rating'].value_counts().index);
plt.figure(figsize=(12,8))
sns.histplot(data=df, x='food_preparation_time', kde=True)
plt.xlabel('Food Prep. Time')
plt.title('Distribution of Food Preparation Time')
plt.show()
There appears to be no obvious pattern on Food Prep time by itself. Further analysis in conjunction with other columns may provide some insight
sns.boxplot(data=df,x='food_preparation_time')
plt.xlabel('Food Preparation time')
plt.show()
Median food preparation time is ~27 minutes. Max is 35 minutes. Min is 20 minutes. Data Inter Quartile Range is about 31-23 = 8 minutes
plt.figure(figsize=(12,8))
sns.histplot(data=df, x='delivery_time', kde=True)
plt.xlabel('Delivery Time')
plt.title('Delivery Time Distribution')
plt.show()
Large number of orders are getting delivered between around ~24 to ~29 minutes
sns.boxplot(data=df,x='delivery_time')
plt.xlabel('Delivery Time')
plt.show()
Median delivery time is 25 mins, min = 15 mins and max = 33 mins. The IQR is 7.5 mins. Food delivery time is left skewed & there are No outliers
df['order_id'].nunique()
1898
Since The number of Unique Order IDs is same as the total number of rows, this dataset only contains unquie orders so earch row exactly represents a unique order and there are No duplicate orders present
df['customer_id'].nunique()
1200
#Lets check how is the distribution of order counts of these 1200 unique customers are
df_cust_id__order_freq_count = df.groupby(['customer_id'] , as_index=False)['customer_id'].value_counts()
df_cust_id__order_freq_count.groupby(['count'])['count'].value_counts()
count count 1 1 784 2 2 267 3 3 83 4 4 33 5 5 19 6 6 6 7 7 4 8 8 1 9 9 1 10 10 1 13 13 1 Name: count, dtype: int64
plt.figure(figsize=(12,8))
sns.countplot(data=df_cust_id__order_freq_count, x='count');
plt.title('Distribution of Unique Customer Order Frequency')
plt.xlabel('Unique Customer Occurrences')
plt.ylabel('Order Count');
The fact that 784 customers only ordered once needs to be examined & addressed
# Write the code here
top5 = df.groupby(['restaurant_name'])['restaurant_name'].value_counts().sort_values(ascending=False)[:5]
top5
restaurant_name restaurant_name Shake Shack Shake Shack 219 The Meatball Shop The Meatball Shop 132 Blue Ribbon Sushi Blue Ribbon Sushi 119 Blue Ribbon Fried Chicken Blue Ribbon Fried Chicken 96 Parm Parm 68 Name: restaurant_name, dtype: int64
# We can cross check the above, by hovering mouse over the first 5 highest counts in the below histogram
plt.figure(figsize=(75,55))
fig = px.histogram(df, x='restaurant_name', color="restaurant_name", labels={
"restaurant_name": "Restaurant Name",
"count":"count"
},
title="Orders per Restaurant").update_xaxes(categoryorder='total descending')
fig.show();
<Figure size 7500x5500 with 0 Axes>
American
# Write the code here
df_weekend = df[df['day_of_the_week'] == 'Weekend']
df_weekend
df_weekend.groupby(['cuisine_type'])['cuisine_type'].value_counts().sort_values(ascending=False)[:1]
cuisine_type cuisine_type American American 415 Name: cuisine_type, dtype: int64
29.24%
# Write the code here
total_num_of_orders_over_20_dollars = df.loc[df['cost_of_the_order'] > 20.0].count().iloc[0]
percent_of_orders_over_20_dollars = round(100 * total_num_of_orders_over_20_dollars / df.shape[0],2)
percent_of_orders_over_20_dollars
29.24
# Write the code here
print('The mean delivery time for this dataset is', round(df['delivery_time'].mean(),2), 'minutes')
The mean delivery time for this dataset is 24.16 minutes
24.16 Minutes
# Write the code here
Top3Customers = df.groupby(['customer_id'])['customer_id'].value_counts().sort_values(ascending=False).head(3)
Top3Customers
customer_id customer_id 52832 52832 13 47440 47440 10 83287 83287 9 Name: customer_id, dtype: int64
plt.figure(figsize=(12,8))
sns.countplot(data=df, x='cuisine_type', hue='day_of_the_week', order = df['cuisine_type'].value_counts().index)
plt.xticks(rotation=60)
plt.legend(title='Day of the week')
plt.tight_layout()
plt.xlabel('Cuisine Type')
plt.ylabel('Order Count')
plt.show();
Observation
# Relationship between cost of the order and cuisine type
plt.figure(figsize=(15,7))
fig=px.box(df, x = "cuisine_type", y = "cost_of_the_order", color="cuisine_type", labels={
"cuisine_type": "Cuisine Type",
"cost_of_the_order":"cost of the order"
},
title="Order Cost By Cuisine Type")
fig.show()
<Figure size 1500x700 with 0 Axes>
Observation
Mexican, Indian, and Middle Eastern have low order counts relative to the "big 4" - Japanese, American, Italian, and Chinese. Japanese, Mexican, American, Indian, Italian, Chinese, and Middle Eastern have similar order cost distributions Korean, Mediterranean, and Vietnamese have low order counts & all have outliers
Lets analyse food prep time per cusine type
plt.figure(figsize=(20,15))
fig = px.box(df, x="food_preparation_time", y="cuisine_type", color="cuisine_type", labels={
"food_preparation_time": "food prep time",
"cuisine_type": "Cuisine Type" },
title="Cusine Type & Food Prep Time")
fig.show();
<Figure size 2000x1500 with 0 Axes>
Observation
Lets analyse food delivery time per cusine type
plt.figure(figsize=(20,15))
fig = px.box(df, x="delivery_time", y="cuisine_type", color="cuisine_type", labels={
"delivery_time": "Delivery Time",
"cuisine_type": "Cuisine Type" },
title="Cusine Type & Delivery Time")
fig.show();
<Figure size 2000x1500 with 0 Axes>
Observation
# Relationship between rating and cuisine type
plt.figure(figsize=(20,15))
fig = px.box(df_rated, x="cuisine_type", y="rating", color="cuisine_type", labels={
"cuisine_type": "cuisine_type",
"rating": "rating" },
title="Rating by Cusine Type")
fig.show();
<Figure size 2000x1500 with 0 Axes>
Observation
With the exception of Mediterranean, Spanish, Korean & Vietnamese, all others follow similar distribution Vietnamese is the only one having its IQR of ratings spead the widest from 3 to 5
# Relationship between day of the week and delivery time
plt.figure(figsize=(20,15))
fig = px.box(df, x="day_of_the_week", y="delivery_time", color="day_of_the_week", labels={
"delivery_time": "delivery time",
"day_of_the_week":"day of the week"
},
title="Delivery times by day of the Week")
fig.show();
<Figure size 2000x1500 with 0 Axes>
Weekday has more delivery times, possilbly due to heavier weekday traffic compared to weekends. Howerver, the IRQ for weekdays are shorter than that of weekend, pointing to somewhat more variability in weekend traffic compared to weekdays
# Relationship between rating and food preparation time
plt.figure(figsize=(15, 7))
sns.pointplot(data=df_rated, x='rating', y='food_preparation_time')
plt.show();
It is understandble to see that higher food preparation times correspond to lower order ratings
Orders with rating 3 has a huge variation in food prep time. Perhaps it is that variation that is contributing to lower ratting, possibly not meeting customers expectations compared to those with rating 4 & 5
# Relationship between rating and cost of the order
plt.figure(figsize=(15, 7))
sns.pointplot(data=df_rated, y='cost_of_the_order', x='rating')
plt.show()
# Plot the heatmap
col_list = ['cost_of_the_order', 'food_preparation_time', 'delivery_time']
plt.figure(figsize=(15, 7))
sns.heatmap(df[col_list].corr(), annot=True, vmin=-1, vmax=1, fmt=".2f", cmap="Spectral")
plt.show()
Per the above heatmaps, there appears to be no correlation between the numeric data points
# let's create a column with a sum of food preparation time and the food delivery time to get to the total ourder fulfillment time
df_rated['order_fulfillement_time']= df_rated['food_preparation_time'] + df_rated['delivery_time']
df.head()
| order_id | customer_id | restaurant_name | cuisine_type | cost_of_the_order | day_of_the_week | rating | food_preparation_time | delivery_time | |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 1477147 | 337525 | Hangawi | Korean | 30.750 | Weekend | Not given | 25 | 20 |
| 1 | 1477685 | 358141 | Blue Ribbon Sushi Izakaya | Japanese | 12.080 | Weekend | Not given | 25 | 23 |
| 2 | 1477070 | 66393 | Cafe Habana | Mexican | 12.230 | Weekday | 5 | 23 | 28 |
| 3 | 1477334 | 106968 | Blue Ribbon Fried Chicken | American | 29.200 | Weekend | 3 | 25 | 15 |
| 4 | 1478249 | 76942 | Dirty Bird to Go | American | 11.590 | Weekday | 4 | 25 | 24 |
plt.figure(figsize=(10,5))
sns.pointplot(data=df_rated,y='order_fulfillement_time',x='rating')
plt.show()
Observation:
Lets come up with some derived numbers based on following assumptions:
Order Fulfillment time = prep time + delivery time
Analyse the the relation of delivery time to other data points to see if there is an opportunity to make some decisions based on delivery times such as reduce cost & / or improve rating
While we don't have data, batching of deliveries from same restaurants need to be availed
Look for opportunities to batch deliveries from same restaurants.
With knowledge of order cost per delivery time, an informed decistion can be made as to what's the minimum order cost allowed to use this service
Important order metrics
cost_per_delivery_time = order cost / delivery time (\$/minute)
Prep time: Calculated per restaurant, per cuisine, overall
Used to prevent deliverers from having to wait for delivery to be ready. Order ratio = # orders / restaurant(s) (greater is better) Can be calculated per restaurant, per cuisine, overall Used to identify busier restaurants where deliveries might be batched.
# create new df with added order_fulfillment_time and cost_per_delivery_time columns
order_analytics_df = df
order_analytics_df['order_fulfillment_time'] = order_analytics_df['food_preparation_time']+order_analytics_df['delivery_time']
order_analytics_df['cost_per_delivery_time'] = order_analytics_df['cost_of_the_order']/order_analytics_df['delivery_time']
order_analytics_df.describe().T
| count | mean | std | min | 25% | 50% | 75% | max | |
|---|---|---|---|---|---|---|---|---|
| order_id | 1898.000 | 1477495.500 | 548.050 | 1476547.000 | 1477021.250 | 1477495.500 | 1477969.750 | 1478444.000 |
| customer_id | 1898.000 | 171168.478 | 113698.140 | 1311.000 | 77787.750 | 128600.000 | 270525.000 | 405334.000 |
| cost_of_the_order | 1898.000 | 16.499 | 7.484 | 4.470 | 12.080 | 14.140 | 22.297 | 35.410 |
| food_preparation_time | 1898.000 | 27.372 | 4.632 | 20.000 | 23.000 | 27.000 | 31.000 | 35.000 |
| delivery_time | 1898.000 | 24.162 | 4.973 | 15.000 | 20.000 | 25.000 | 28.000 | 33.000 |
| order_fulfillment_time | 1898.000 | 51.534 | 6.834 | 35.000 | 47.000 | 52.000 | 56.000 | 68.000 |
| cost_per_delivery_time | 1898.000 | 0.719 | 0.381 | 0.153 | 0.451 | 0.614 | 0.918 | 2.202 |
Observations:
#sns.histplot(data=order_analytics_df,x='cuisine_type',y="cost_per_delivery_time", color='cusine_type')
#plt.xticks(rotation=90);
fig = px.density_heatmap(
data_frame=order_analytics_df, x="cuisine_type", y="cost_per_delivery_time", color_continuous_scale="PuBu", labels={
"cuisine_type": "cuisine type",
"cost_per_delivery_time":"cost per delivery time"
},
title="cost per delivery time by cuisine type")
fig.show()
order_analytics_df.sort_values(by='cost_per_delivery_time', ascending=True, inplace=True)
plt.figure(figsize=(20,15))
fig = px.box(order_analytics_df, x="cost_per_delivery_time", y="cuisine_type", color="cuisine_type", labels={
"cost_per_delivery_time": "cost per delivery time",
"cuisine_type": "Cuisine Type" },
title="Cost Per Delivery by Cusine Type")
fig.show();
<Figure size 2000x1500 with 0 Axes>
Observations:
Revenue & Order Profitability
# Lets check how many orders were 5 or below where company has no revenue, meaning its losing money for those orders, given the cost to deliver is there
orders_with_zero_revenue = len(df[df['cost_of_the_order'] < 6 ])
print(f'{orders_with_zero_revenue} orders were less than $6, contributing to 0 revenue')
revenue_with_cost_6_to_10 = df[ (df['cost_of_the_order'] < 11) & (df['cost_of_the_order'] > 5) ] ['revenue'].sum()
order_count_with_cost_6_to_10 = len(df[ (df['cost_of_the_order'] < 11) & (df['cost_of_the_order'] > 5) ])
revenue_per_order_with_cost_6_10 = round(revenue_with_cost_6_to_10 / order_count_with_cost_6_to_10,2)
print(f'{order_count_with_cost_6_to_10} orders were between $6 & $10, with revenue per order at: ${revenue_per_order_with_cost_6_10}')
revenue_with_cost_11_to_20 = df[ (df['cost_of_the_order'] < 21) & (df['cost_of_the_order'] > 10) ] ['revenue'].sum()
order_count_with_cost_11_to_20 = len(df[ (df['cost_of_the_order'] < 21) & (df['cost_of_the_order'] > 10) ])
revenue_per_order_with_cost_11_20 = round(revenue_with_cost_11_to_20 / order_count_with_cost_11_to_20,2)
print(f'{order_count_with_cost_11_to_20} orders were between $11 & $20, with revenue per order at: ${revenue_per_order_with_cost_11_20}')
79 orders were less than $6, contributing to 0 revenue 395 orders were between $6 & $10, with revenue per order at: $3.24 972 orders were between $11 & $20, with revenue per order at: $3.15
# Write the code here
#1 Filter the rated restaurants
df_rated = df[df['rating'] != 'Not given'].copy()
#2 convert rating column from object to integer
df_rated['rating'] = df_rated['rating'].astype('int')
#3 create dataframe containing only the restaurant name & its corresponding rating counts & update the column label
df_rating_count = df_rated.groupby(['restaurant_name'])['rating'].count().sort_values(ascending = False).reset_index()
# update the column label to reflect that it now shows the rating count
df_rating_count.rename(columns = {'rating':'rating count'}, inplace = True)
#4 filter to get the data of restaurants that have rating count more than 50
rest_over_50_ratings = df_rating_count[df_rating_count['rating count']>50]['restaurant_name']
#5 Filter the rated data so that it only contains the restaurant names from prev steps
rest_data_with_over_50_ratings = df_rated[df_rated['restaurant_name'].isin(rest_over_50_ratings)].copy()
#6 Extract the names of the restaurants who mean ratings are over 4
rest_names_over_4_mean_rating = rest_data_with_over_50_ratings.groupby(rest_data_with_over_50_ratings['restaurant_name'])['rating'].mean().sort_values(ascending = False).reset_index()['restaurant_name']
print(f'Restaurants that have a rating count of more than 50 and the average rating greater than 4 are:\n{rest_names_over_4_mean_rating}')
Restaurants that have a rating count of more than 50 and the average rating greater than 4 are: 0 The Meatball Shop 1 Blue Ribbon Fried Chicken 2 Shake Shack 3 Blue Ribbon Sushi Name: restaurant_name, dtype: object
Net revenue generated by the company across all orders is $6,166
# Write the code here
#1 Following function returns the calculated revenue based on food cost
def calc_revenue(a):
if a > 20:
return a*0.25
elif a > 5:
return a*0.15
else:
return 0
#2 Extract a series that contains costs
costs = df['cost_of_the_order']
#3 Iterate through each row of the costs series
for i, cost_value in enumerate(costs):
#4 Populate a new Column "Revenue" with the calculated revenue for each row
df.loc[i,['revenue']] = calc_revenue(cost_value)
#5 Calculate the total revenue by taking the sum of "Revenue" column
net_revenue = df['revenue'].sum()
print(f"Net revenue generated by the company across all orders is ${net_revenue}")
Net revenue generated by the company across all orders is $6166.303
10.54% of orders take more than 60 minutes to get delivered
# Write the code here
# let's create a column with a sum of food preparation time and the food delivery time to get to the total order delivery time
df['total_order_delevery_time']= df['food_preparation_time'] + df['delivery_time']
df.head()
# Extract all the rows where the newly created "total_order_delevery_time" column shows value greater than 60
df_over_60 = df[df['total_order_delevery_time'] > 60]
# Now calcualte the requested number using the above dataset row counts with that of the original data set containing all rows
percent_over_60 = round(100*df_over_60.shape[0]/df.shape[0],2)
print(f'{percent_over_60}% of orders take more than 60 minutes to get delivered')
10.54% of orders take more than 60 minutes to get delivered
# Write the code here
fig = px.box(df, x="day_of_the_week", y="delivery_time", color="day_of_the_week", labels={
"delivery_time": "delivery time",
"day_of_the_week":"day of the week"
},
title="Delivery times by day of the Week")
fig.show()
mean_dt_weekend = round(df[df['day_of_the_week']=='Weekend']['delivery_time'].mean(),2)
mean_dt_weekday = round(df[df['day_of_the_week']=='Weekday']['delivery_time'].mean(),2)
mean_delivery_time_difference = round(mean_dt_weekday - mean_dt_weekend,2)
percentage_additional_mean_dt_weekday = round(100 * mean_delivery_time_difference / mean_dt_weekday)
percentage_weekday_delivery_time = mean_dt_weekday/mean_dt_weekend
print(f'The mean delivery times:\n Weekend: {mean_dt_weekend}\n Weekday: {mean_dt_weekday}\n')
print(f'On average, it takes about {mean_delivery_time_difference} more minutes to deliver food on Weekdays ({percentage_additional_mean_dt_weekday}% longer), as compared to Weekends')
The mean delivery times: Weekend: 22.47 Weekday: 28.34 On average, it takes about 5.87 more minutes to deliver food on Weekdays (21% longer), as compared to Weekends
The delivery times on Weekdays are about about 21% higher, on average, compared to that on Weekdays. The side by side box plot also shows the upward shift of the delivery times distribution for the Weekdays. The delivery time Interqaurtile Ragnes are:
Per above, the delivery times are spread much wider on Weekends than Weekdays.
Out of 1200 unique customers, 784 (65%) are one time customers, meaning they only used this service just once. i.e only 35% are repeat customers
There exists a major variatons in orders for cusine types that can be grouped as follows by order count:
Vietnamese (7)
Cuisines/Restaurants with higher order count tend to have higher average order costs, with mean order cost being: $16.50.
Revenue Per Order
972 orders were between 11 USD and 20 USD, with revenue per order at: 2.21 USD
Order Rating
Delivery times impact ratings
Operational Challenges
Improve Customer Loyality
If a one time customer does not return within an anticipated number of days (as learned from other repeat customers), generate an online incentive such as a discount coupon etc. to improve repeat customer count
Balancing out Cuisine Type Variety
Improve Revenue for low Cost Orders
Reduce Delivery Times